{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# aitextgen Generation Hello World\n",
    "\n",
    "by Max Woolf\n",
    "\n",
    "A \"Hello World\" tutorial showing how text generation works with aitextgen!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from aitextgen import aitextgen"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Without any parameters, `aitextgen()` will download, cache, and load the 124M \"small\" version of GPT-2 (~500MB on disk) into an `aitextgen` folder in the current working directory. This model is good for prototyping generation.\n",
    "\n",
    "You can change the directory the model is saved to and loaded from by setting the `cache_dir` parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:aitextgen.aitextgen:Downloading gpt2 model.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "2a4e7d7f9a5a4dcbbe4f95fec9cde7c0",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=554.0, style=ProgressStyle(description_…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "6ba5dd224b7c41d995dc7d73169adf94",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=548118077.0, style=ProgressStyle(descri…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:aitextgen.aitextgen:Using the default GPT-2 Tokenizer.\n"
     ]
    }
   ],
   "source": [
    "ai = aitextgen()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "SARUNANA SINGAPORE\n",
      "\n",
      "The Sri Lankan Navy conducted its fourth round of combat maneuvers for the first time in its 20 year training history on Thursday.\n",
      "\n",
      "The Sea Southerners' four-speed patrol was part of the naval exercise \"Operation Sustainability,\" which is a collaborative effort with the U.S. Naval Reserve, the Navy and the South Korea Maritime Self Defense Force.\n",
      "\n",
      "The patrol, which was the result of a joint joint training exercise that resulted in over 7,000 American sailors and military aircraft, took place on the South Korean coast of North and South Korea, said Vice Admiral James M. \"Skip\" Pincap for the Navy, which has been involved in exercises in the North and South Korean seas for about 70 years.\n",
      "\n",
      "The U.S. Navy's current naval exercise, which spans four months, commenced last year. That's in response to a joint military exercise of the U.S. Navy\n"
     ]
    }
   ],
   "source": [
    "ai.generate()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can generate multiple texts at a time with the `n` parameter. You can also control the length of the generated text with the `max_length` parameter (default: 200 tokens)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "AUSTIN -- In March, Travis County prosecutors said that their investigation into a fatal car-crash accident in a neighborhood near Austin was being compromised because they were using an incomplete and unverified criminal history.\n",
      "\n",
      "Prosecutors said they learned that Tilden County DA Greg Saylor and district attorney Chris D'Amico are two of four Texas officials on the case that were implicated in the case. According to a search warrant obtained by The Austin American-Statesman, the warrant reads:\n",
      "==========\n",
      "In 2008, the city had a new water tower. It was located on the corner of North and State avenues in the city center. This tower was about an inch tall, and had about five stories in the middle. The walls were dark brick with a concrete base, white flooring, and thick slabs of brick that filled the surrounding building.\n",
      "\n",
      "This building was designed by the City.\n",
      "\n",
      "The new housing did not last long…\n",
      "\n",
      "On April 18, 2009, a\n",
      "==========\n",
      "In recent years there's been a significant escalation in the fight against Ebola that can result in a serious illness or death, and several countries have begun to seek out and treat such people.\n",
      "\n",
      "The United States and others have begun to see and track how they've been treated by WHO's two \"Ebola Surveillance teams,\" where they monitor, evaluate and respond to the cases and determine who are the most likely to be infected in their local communities.\n",
      "\n",
      "But there's also been much\n"
     ]
    }
   ],
   "source": [
    "ai.generate(n=3, max_length=100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can optionally seed the input text with a `prompt`. If you do, the prompt will be bolded in the console/notebook output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1mI believe in unicorns because\u001b[0m of their ability to read, write and recognize syllables. They have also become the source of modern computer science, thanks to the power of words like, say, \"This is a monkey\"; or \"This is a tree.\"\n",
      "\n",
      "Now some will point fingers at Apple, which says that \"C-word\" is more like a dictionary than a computer program. They are correct, that Apple's product language, known as Apple II, is for words. They\n",
      "==========\n",
      "\u001b[1mI believe in unicorns because\u001b[0m of their good, bad, and ugly qualities. I believe that unicorns are the kind that have good, bad, and ugly qualities. I believe that unicorns have high moral and ethical standards because they're kindhearted.\"\n",
      "\n",
      "The group's first act will be to show that a group's character is worth celebrating because it's kind, gentle, and not cruel.\n",
      "\n",
      "\"By encouraging good behavior, to show we can get things done, to tell us\n",
      "==========\n",
      "\u001b[1mI believe in unicorns because\u001b[0m there are thousands of them.\n",
      "\n",
      "A few weeks ago I saw people get frustrated when I told them about how long it takes to get from point A to point B, as my first priority was getting those same people where they actually belong. Why isn't there something more to accomplish before those first six days, or even half a day after these first six days of making such a conscious effort? I want my family and friends to know that I are not here today\n"
     ]
    }
   ],
   "source": [
    "ai.generate(n=3, max_length=100, prompt=\"I believe in unicorns because\")"
   ]
  },
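  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The bold prompt in the output above appears to come from ANSI escape codes wrapped around the prompt text (`\\x1b[1m` to start bold, `\\x1b[0m` to reset). If you copy console output somewhere that does not render those codes, a small helper can strip them. This is a minimal sketch using only the standard library; `strip_ansi` is a hypothetical helper for illustration, not part of aitextgen."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "BOLD = '\\x1b[1m'   # ANSI escape code: start bold text\n",
    "RESET = '\\x1b[0m'  # ANSI escape code: reset formatting\n",
    "\n",
    "# Matches ANSI styling sequences such as the two above.\n",
    "ANSI_PATTERN = re.compile(r'\\x1b\\[[0-9;]*m')\n",
    "\n",
    "def strip_ansi(text):\n",
    "    # Hypothetical helper: remove ANSI styling codes, leaving plain text.\n",
    "    return ANSI_PATTERN.sub('', text)\n",
    "\n",
    "# Stand-in for a generated text with a bolded prompt.\n",
    "sample = BOLD + 'I believe in unicorns because' + RESET + ' there are thousands of them.'\n",
    "\n",
    "print(sample)              # prompt shown in bold where ANSI codes are rendered\n",
    "print(strip_ansi(sample))  # same text without styling codes"
   ]
  },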
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lastly, you can generate texts in bulk using `generate_to_file()` and save them to a file, allowing you to search through them and curate them!\n",
    "\n",
    "You can also change the `temperature` to make the text less crazy (lower values)...or more crazy (higher values)."
   ]
  },
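  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the `temperature` setting more concrete, here is a generic sketch of temperature-scaled sampling probabilities. It is not aitextgen's internal code: the function name and the three example scores are made up for illustration. The idea is that the model's raw scores are divided by the temperature before being turned into probabilities, so a low temperature concentrates probability on the top choice and a high temperature spreads it out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "def softmax_with_temperature(logits, temperature=1.0):\n",
    "    # Divide each raw score by the temperature, then apply a softmax.\n",
    "    scaled = [x / temperature for x in logits]\n",
    "    peak = max(scaled)  # subtract the max for numerical stability\n",
    "    exps = [math.exp(s - peak) for s in scaled]\n",
    "    total = sum(exps)\n",
    "    return [e / total for e in exps]\n",
    "\n",
    "# Hypothetical raw scores for three candidate next tokens.\n",
    "logits = [2.0, 1.0, 0.1]\n",
    "\n",
    "for t in (0.7, 1.0, 1.2):\n",
    "    probs = softmax_with_temperature(logits, temperature=t)\n",
    "    print(t, [round(p, 3) for p in probs])"
   ]
  },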
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:root:Generating 10 texts to ATG_20200506_035322_13582062.txt\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "08055273e46f4983a8042683880171ff",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(FloatProgress(value=0.0, max=10.0), HTML(value='')))"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "ai.generate_to_file(n=10,\n",
    "                    prompt=\"I believe in unicorns because\",\n",
    "                    max_length=100, temperature=1.2)"
   ]
  },
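  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once the texts are saved, searching and curating them can be done with a few lines of plain Python. The sketch below assumes the samples in the file are separated by a line made up only of `=` characters, like the separators in the console output above; check your own file before relying on that. `split_samples` and `search` are hypothetical helpers, and the sample text is a made-up stand-in for the contents of a generated file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "def split_samples(text):\n",
    "    # Assumption: samples are separated by a line consisting only of '=' characters.\n",
    "    parts = re.split(r'^={5,}\\s*$', text, flags=re.MULTILINE)\n",
    "    return [p.strip() for p in parts if p.strip()]\n",
    "\n",
    "def search(samples, keyword):\n",
    "    # Case-insensitive keyword filter over the samples.\n",
    "    return [s for s in samples if keyword.lower() in s.lower()]\n",
    "\n",
    "# Stand-in for the contents of a generated file; in practice, read the file\n",
    "# with open(path, encoding='utf-8') and pass its contents to split_samples().\n",
    "text = 'I believe in unicorns because they are kind.\\n' + '=' * 20 + '\\nI believe in unicorns because of magic.\\n'\n",
    "\n",
    "samples = split_samples(text)\n",
    "print(len(samples))\n",
    "print(search(samples, 'magic'))"
   ]
  },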
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# MIT License\n",
    "\n",
    "Copyright (c) 2020 Max Woolf\n",
    "\n",
    "Permission is hereby granted, free of charge, to any person obtaining a copy\n",
    "of this software and associated documentation files (the \"Software\"), to deal\n",
    "in the Software without restriction, including without limitation the rights\n",
    "to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n",
    "copies of the Software, and to permit persons to whom the Software is\n",
    "furnished to do so, subject to the following conditions:\n",
    "\n",
    "The above copyright notice and this permission notice shall be included in all\n",
    "copies or substantial portions of the Software.\n",
    "\n",
    "THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n",
    "IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n",
    "FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n",
    "AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n",
    "LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n",
    "OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n",
    "SOFTWARE."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.7.5 64-bit",
   "language": "python",
   "name": "python37564bitb9ff4e3157b244a896f88d1e5f3eb324"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
